SUMMARY

The questions we shall discuss in what follows belong to
two fields which formerly were quite disjoint, the classical
theory of probability and the classical calculus of variations.
That there is now considerable overlap is due to the rise in
scientific interest in the field of control processes. Although
it is only within the last few years that the theory of
feedback control has penetrated the academic curriculum and
become a respectable member of the mathematical community, the
conventional formulation is already far outmoded. In order to
treat current and future problems of any significance, it is
absolutely essential to introduce stochastic elements. These,
however, enter in entirely novel ways, not in the fairly well
understood fashion of conventional stochastic processes, but
in connection with "learning processes," or, as we shall
henceforth say, adaptive processes.

In what follows we show how the functional equation technique
of dynamic programming can be used to treat adaptive
control processes, and how continuous processes can be defined
in terms of the discrete versions.
A MATHEMATICAL FORMULATION OF VARIATIONAL PROCESSES
OF ADAPTIVE TYPE

Richard  Bellman 


1.  Introduction 

The questions we shall discuss in what follows belong to
two fields which formerly were quite disjoint, the classical
theory of probability and the classical calculus of variations.
That there is now considerable overlap is due to the rise in
scientific interest in the field of control processes.

Although it is only within the last few years that the theory
of feedback control has penetrated the academic curriculum and
become a respectable member of the mathematical community, the
conventional formulation is already far outmoded. In order to
treat current and future problems of any significance it is
absolutely essential to introduce stochastic elements. These,
however, enter in entirely novel ways, not in the fairly well
understood fashion of conventional stochastic processes, but
in connection with "learning processes," cf. [2], or, as we
shall henceforth say, adaptive processes.

In order to prepare a suitable background for the introduction
of the new features, let us review the elementary ideas
of feedback control processes. We are, of course, here
interested only in the mathematical presentation of these
concepts, and shall ignore any of the difficulties of engineering
or statistical application.

One version of the feedback control problem is that of
maximizing a functional of the form

(1)  $J(y) = \int_0^T F(x, y)\, dt$

over all functions y(t), where x and y are connected by
means of a differential equation

(2)  $\frac{dx}{dt} = G(x, y), \quad x(0) = c,$

and y may be subject to further constraints which, in general,
will depend upon x(t).

Although problems of this genre appear to belong in a very
natural way to the calculus of variations, and thus to be
susceptible to classical techniques, they are often more
advantageously treated by means of the theory of dynamic
programming [1], [2]. It turns out to be convenient from many
points of view, conceptual, analytic, and computational, to
consider a discrete version of the foregoing problem.

Let us agree to maximize the function

(3)  $J_N(y) = \sum_{k=0}^{N} F(x_k, y_k),$

where

(4)  $x_{k+1} = x_k + G(x_k, y_k), \quad x_0 = c,$

over the set of y: $y_0, y_1, \ldots, y_N$, with, as above, possibly
some constraints present. Not only are problems posed in this
form much more amenable to the application of digital computers,
but, what is often forgotten, they frequently represent more
realistic descriptions of the original physical process.

So far, everything has been very deterministic. Let us
now introduce stochastic elements. In place of the transformation
of (2), let us suppose that $x_{k+1}$ is obtained by means
of a stochastic transformation

(5)  $x_{k+1} = x_k + G(x_k, y_k, r_k), \quad x_0 = c.$

In place of the original maximization problem, let us consider
the problem of maximizing the expected value of the function

(6)  $J_N(y) = \sum_{k=0}^{N} F(x_k, y_k, r_k).$

At the moment, we take the $r_k$ to represent a sequence of
independent random variables, and the $y_k$ are to be chosen in
feedback fashion. By this we mean that $y_k$ is chosen with
knowledge of $x_0, x_1, \ldots, x_{k-1}$, $y_0, y_1, \ldots, y_{k-1}$,
$r_0, r_1, \ldots, r_{k-1}$, but not of $r_k$, nor of any of the
following x's, y's, or r's.

In [3] we discussed in some detail the use of the
functional equation techniques of dynamic programming to treat
optimization problems of this nature. Our emphasis there was
upon the use of discrete processes to lay a foundation for the
rigorous formulation of continuous processes.

In this paper, we wish to discuss corresponding problems
arising in cases in which the distribution functions for the
$r_k$ are only partially known. The problems we discuss here
represent only a small part of the cornucopia of questions
which the theory of feedback control thrusts upon us. In a
series of papers with R. Kalaba [4, 5, 6, 7, 8], we have laid
a foundation for the study of such questions.

2.  Multistage  Decision  Processes 

To treat the optimization problems described in the preceding
paragraph, as well as those of more complex nature, we
use the concept of a multistage decision process. Let p be
a point in a space S and T(p,q) a set of transformations,
defined for all $p \in S$ and $q \in S'$, a second space, with the
property that $T(p,q) \in S$ for all $p \in S$ and $q \in S'$.

Starting with a point $p_1$, a choice of $q_1$ is made,
leading to a new point $p_2 = T(p_1, q_1)$. Repeating the process,
a choice of $q_2$ leads to a third point $p_3 = T(p_2, q_2)$, and
so on. The set of q's, $[q_1, q_2, \ldots, q_N]$, is called a policy,
and the process itself is called a multistage decision process.

Let us now suppose that the $q_i$ are to be chosen so as to
maximize a preassigned criterion function

(1)  $F(p_1, p_2, \ldots, p_N; q_1, q_2, \ldots, q_N).$

A policy which maximizes F is called an optimal policy.

Since the problem of determining optimal policies in this
generality is much too difficult, let us restrict ourselves to
the case where F is separable,

(2)  $F_N = R(p_1, q_1) + R(p_2, q_2) + \cdots + R(p_N, q_N).$

Fortunately, in many significant applications, F can be taken
to have this form.

The case where only the term $R(p_N, q_N)$ appears is called
terminal control in engineering circles. If the number of
stages, N, is itself a function of the sequence of states
and decisions, we speak of an implicit variational problem.

The basic problem is that of determining optimal policies
and the value of the maximum of F.


3.  Functional  Equation  Approach 

For a variety of reasons which we shall not enter into,
conventional methods of calculus are seldom operative by
themselves. Let us introduce the sequence of functions $\{f_N(p_1)\}$
defined by the relation

(1)  $f_N(p_1) = \max_{[q_1, \ldots, q_N]} \left[ R(p_1, q_1) + R(p_2, q_2) + \cdots + R(p_N, q_N) \right],$

for N = 1, 2, ..., and $p_1 \in S$.

An application of the principle of optimality ([1], p. 83)
(or, in this case, some simple manipulation) yields the basic
recurrence relation

(2)  $f_N(p_1) = \max_{q_1} \left[ R(p_1, q_1) + f_{N-1}(T(p_1, q_1)) \right],$

for N = 2, 3, ..., with

(3)  $f_1(p_1) = \max_{q_1} R(p_1, q_1).$

These equations yield two sequences, the sequence of
maxima, $\{f_N(p_1)\}$, and the sequence of policy functions,
$\{q_N(p_1)\}$. The function $q_N(p_1)$ is the choice of $q_1$ which is
made when the system is in state $p_1$ and there are N stages
remaining.
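As an illustration of how the recurrence relation can be used computationally, the following is a minimal Python sketch, assuming a finite set of hashable states and a finite set of decisions. The names `solve_multistage`, `decisions`, `T` and `R`, and the toy process at the end, are hypothetical choices made for this illustration and do not come from the text.

```python
from functools import lru_cache


def solve_multistage(N, p1, decisions, T, R):
    """Return (f_N(p1), [q_1, ..., q_N]) for a finite deterministic process.

    decisions(p) lists the admissible q in state p, T(p, q) is the next
    state, and R(p, q) is the single-stage return (all assumed inputs).
    """

    @lru_cache(maxsize=None)
    def f(n, p):
        # n is the number of stages remaining; returns (maximum, best q).
        best_value, best_q = None, None
        for q in decisions(p):
            value = R(p, q)
            if n > 1:
                value += f(n - 1, T(p, q))[0]
            if best_value is None or value > best_value:
                best_value, best_q = value, q
        return best_value, best_q

    # Read off the policy in feedback fashion: state first, then decision.
    policy, p = [], p1
    for n in range(N, 0, -1):
        q = f(n, p)[1]
        policy.append(q)
        p = T(p, q)
    return f(N, p1)[0], policy


# Toy process: states 0..3, decision 0 (stay) or 1 (move up at cost 1).
value, policy = solve_multistage(
    N=3,
    p1=0,
    decisions=lambda p: (0, 1),
    T=lambda p, q: min(p + q, 3),
    R=lambda p, q: 2 * p - q,
)
```

The memoized function `f(n, p)` plays the role of both sequences at once: its first component is the maximum return and its second is the policy function for n stages remaining.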

4.  Discussion 

The usual approach to the foregoing maximization problem
attempts to determine the set $[q_1, q_2, \ldots, q_N]$ at one time,
using variational techniques. In place of this, we determine
$q_1$ in terms of $p_1$ and N, then $q_2$ in terms of $p_2$ and
N - 1, and so on. This is feedback control. We determine the
"control vector" $q_k$ in terms of the current state of the
system, $p_k$, and the remaining duration of the process, N - k + 1.

For deterministic processes, the two approaches are equivalent.
For stochastic processes, they diverge rapidly. We shall
pursue the "feedback" approach since it is both easier to follow
and much the more important.

5.  Stochastic  Multistage  Decision  Processes 

Let us now suppose that a choice of $q_1$ in state $p_1$
yields a state $p_2 = T(p_1, q_1, r_1)$, where $r_1$ is a random vector
with a given distribution function $dG(r_1)$. As above, we assume
that $T(p_1, q_1, r_1) \in S$ for $p_1 \in S$, $q_1 \in S'$ and $r_1$ chosen
from $dG(r_1)$.

In place of the maximization problem in §2, we consider the
problem of maximizing the expected value of

(1)  $F_N = R(p_1, q_1, r_1) + R(p_2, q_2, r_2) + \cdots + R(p_N, q_N, r_N)$
over all feedback policies $[q_1, q_2, \ldots, q_N]$. By this we mean
that $q_1$ is chosen with a knowledge of $p_1$. After $q_1$ is
determined, $r_1$ is obtained from $dG(r_1)$, giving rise to $p_2$.
Then $q_2$ is selected, with knowledge of $p_2$, $r_2$ is obtained
from $dG(r_2)$, yielding $p_3$, and so on.

Introducing the sequence of functions

(2)  $f_N(p_1) = \max_{[q_1, \ldots, q_N]} \operatorname{Exp} F_N,$

N = 1, 2, ..., we see that

(3)  $f_1(p_1) = \max_{q_1} \int R(p_1, q_1, r_1)\, dG(r_1),$

and, as above,

(4)  $f_N(p_1) = \max_{q_1} \left[ \int \left( R(p_1, q_1, r_1) + f_{N-1}(T(p_1, q_1, r_1)) \right) dG(r_1) \right]$

for N = 2, 3, ....

We see then that stochastic processes of this type can be
treated in very much the same fashion as the deterministic
processes discussed earlier.
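To make the parallel with the deterministic case concrete, here is a minimal Python sketch of the stochastic recurrence, assuming that the random vector takes finitely many values with known probabilities, so that the integral becomes a finite sum. The names `solve_stochastic` and `r_dist`, and the toy process, are hypothetical and chosen only for this illustration.

```python
from functools import lru_cache


def solve_stochastic(decisions, T, R, r_dist):
    """Build f(n, p) -> (maximum expected return, best decision).

    r_dist is an assumed finite distribution: a tuple of (r, probability)
    pairs standing in for the distribution function of r.
    """

    @lru_cache(maxsize=None)
    def f(n, p):
        best_value, best_q = None, None
        for q in decisions(p):
            value = 0.0
            for r, prob in r_dist:
                stage = R(p, q, r)
                if n > 1:
                    stage += f(n - 1, T(p, q, r))[0]
                value += prob * stage  # expectation over r
            if best_value is None or value > best_value:
                best_value, best_q = value, q
        return best_value, best_q

    return f


# Toy process: an attempted move up (q = 1) succeeds only when r = 1.
f = solve_stochastic(
    decisions=lambda p: (0, 1),
    T=lambda p, q, r: min(p + q * r, 3),
    R=lambda p, q, r: 3 * p - q,
    r_dist=((0, 0.5), (1, 0.5)),
)
expected_return, first_decision = f(2, 0)
```

Because the next state is random, a fixed list of decisions is no longer enough; the sketch returns the function `f` itself, so that the decision can be looked up for whatever state actually occurs.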

6.  Prediction  and  Information  Theory 

Let us note in passing that these techniques can be used
to provide new approaches to prediction and information theory,
and extensions of the previous results. For prediction theory,
see Kalman [9], and Bellman [10]; for information theory, see
Bellman-Kalaba [7], [11], and Marschak [12].
7. Adaptive Processes

We now wish to consider processes in which not enough is
known to use a formulation of the type given above. There are
many ways of treating processes of this type, and it is never
clear as to which is the proper way of doing this. Nor is it
clear that this adjective "proper" has any meaning in this
context.

It must be recognized, however ruefully or regretfully,
that no definitive theory of uncertainty can ever exist. The
theories that are used will depend upon the applications that
are made and the personal philosophy of the user.

We are thinking of processes in which

(1)  (a) cause and effect may not be known;
     (b) the state of the system at any time may not be known;
     (c) the range of decisions may not be known;
     (d) the utility functions (e.g. R(p,q)) may not be known;
     (e) the duration of the process may not be known;
     (f) it may not be known whether deterministic or stochastic
         influences are paramount, or whether the process is a
         one-person or multi-person process.


These are not problems which conventional mathematical
techniques are designed to treat. We propose to show how they
can be precisely formulated and treated analytically by means of
the foregoing mathematical apparatus, the functional equation
approach of dynamic programming. For some other approaches which
appear promising, see Robbins [13], Box [14].
8. Information Pattern

In treating processes involving uncertainty, our hope is
that the multistage nature of the situation will enable us to
reduce the level of uncertainty stage-by-stage. This idea
leads to some interesting ideas concerning asymptotic behavior
which we shall discuss below.

It is not to be expected that in all cases a simplification
will ensue as the process continues. There is little difficulty
in displaying processes which complicate to an extraordinary
degree as additional information is obtained.

Without worrying about such matters, let us formulate an
important type of adaptive process. We follow the brief sketch
given in [8]. Let the state of the system S be specified,
as usual, by a point p in phase space, and by an information
pattern P. This information pattern represents the
information about the process that we retain in order to
determine some of the properties of the decision process which are
initially unknown. In our case, let us assume that only the
distribution function for r is unknown. The simplest
information pattern that one can think of in this case is the entire
previous history of the process. Generally, one can do much
better than this and substantially compress the vast amount of
data.

The state of the system is now specified by a point in an
extended phase space, [p,P]. A choice of a decision vector
q results in a transformation of p into $T_1(p,P;q,r)$, and
P into $T_2(p,P;q,r)$. Here r is a random vector variable,
specified by an a priori probability distribution $dG(p,P;q,r)$,
itself a part of the information pattern P.

Let us suppose, for the sake of simplicity, that the new
state $p_1$ is known after the decision has been made. Let
the a priori single stage return be $\phi(T_1(p,P;q,r), T_2(p,P;q,r))$.

Then, introducing the function

(1)  $f_N(p,P) = \min \operatorname{Exp} \phi(p_N, P_N),$

where, as in the preceding cases, the minimum is taken over
feedback control policies, we have the functional equations

(2)  $f_N(p,P) = \min_q \left[ \int f_{N-1}(T_1(p,P;q,r), T_2(p,P;q,r))\, dG(p,P;q,r) \right],$

for N = 2, 3, ..., with

(3)  $f_1(p,P) = \min_q \left[ \int \phi(T_1(p,P;q,r), T_2(p,P;q,r))\, dG(p,P;q,r) \right].$

These relations can be used to establish the existence of
optimal policies and to study further properties of the
multistage process. In particular, as we shall discuss below, they
can be used as a basis for the construction of a theory of
continuous processes.

9. Sequential Machines, Coin-weighing and Search Processes

The further study of information patterns inevitably leads
to a consideration of sequential machines and search processes
in general. As an illustration of the way in which the
information can become complicated in an extraordinary fashion as a
process continues, consider the well-known puzzle of locating
a defective coin in a batch of N coins, given an equal arm
balance, and its extensions.

The initial information is that a batch of N coins
contains one defective coin. After a weighing, involving the
comparison of two groups of k coins chosen from this original
set of N coins, we know that the defective coin is in one of
these two sets of k coins, or in the remaining batch of
untested N - 2k coins. Thus the form of the information
remains constant with each succeeding test, or stage of the
process, and simplifies to the point where we can eliminate the
original uncertainty.
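The single-coin case lends itself to a short functional-equation computation. The Python sketch below rests on an assumption not stated in the text, namely that the defective coin is known to be heavier, so that an unbalanced weighing identifies which group of k coins contains it. Under that assumption the least number of weighings w(N) needed in the worst case satisfies w(N) = 1 + min over k of max(w(k), w(N - 2k)), with w(0) = w(1) = 0. The function name `min_weighings` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def min_weighings(n):
    """Worst-case weighings to find one heavier coin among n (assumed model)."""
    if n <= 1:
        return 0  # nothing left to decide
    best = None
    for k in range(1, n // 2 + 1):
        # After weighing k against k, the coin lies in a group of k
        # (scale tips) or in the untested n - 2k coins (scale balances).
        worst = max(min_weighings(k), min_weighings(n - 2 * k))
        if best is None or 1 + worst < best:
            best = 1 + worst
    return best


weighings_for_nine = min_weighings(9)
```

The information pattern here is a single integer, the size of the group known to contain the defective coin, which is why the form of the information stays constant from stage to stage.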

Consider what happens, however, when we start with the
knowledge that there are two defective coins. Comparing two
sets of k coins each, we are led to the following possibilities:

(1)  (a) If the scale balances, there is either one defective
         coin in each of the k-sets, or none, which means
         that there are two defectives in the remaining
         N - 2k coins.

     (b) If the scale unbalances, there is either one defective
         coin or two defective coins in one of the k-sets, and
         either one or none left in the remaining N - 2k coins.

It is easy to see that as the testing process continues,
the information pattern increases in size and in complexity.

If we allow perfectly general testing policies which admit the
mixing of different batches, it appears to be hopeless to
attempt to keep track of the process.


P-1991 


Problems of this nature are of great practical importance
and extremely difficult to handle by means of analytic
techniques. For some preliminary work on the foregoing problem by
means of functional equation techniques, see Bellman [15],
Bellman and Gluss [16], Cairns [17].

For a discussion of problems of related nature, see
M. Sobel and P. A. Groll [18], R. Dorfman [19], and P. Ungar [20].

10. Continuous Adaptive Processes

A simple and conceptually important way to found a theory
of continuous processes of any type is by a passage to the
limit in a theory of discrete processes. In some situations,
it is not difficult to construct a theory of the continuous
process directly. In these cases, it is essential to establish
the equivalence of the two approaches. Many theorems of this
type exist in connection with the study of differential and
difference equations, in the field of partial differential and
difference equations, and in the theory of probability.

In some fields, only recently developed, the continuous
theories do not exist and seem quite difficult to formulate.
For these, a passage from the discrete to the continuous seems
to be the easiest and safest approach.

One advantage of using the passage to the limit approach
lies in the fact that we can in many cases establish the
existence of a limiting continuous process under conditions
which are far weaker than those necessary to impose in order
to guarantee the existence of a continuous process constructed
directly.

In order to illustrate these comments, which at first may
not seem reasonable, let us consider a problem in the calculus
of variations.

Suppose that we wish to determine the minimum of the
functional

(1)  $J(u) = \int_0^T g(u, u')\, dt$

over all functions u(t) satisfying the initial condition
u(0) = c. This problem is far more complex than it may seem
at first glance. In the first place, to assure that an actual
minimum, rather than an infimum, exists, strong conditions must
be imposed upon the function g(u,u'). Secondly, the standard
variational technique, which leads to the Euler equation,
possesses many drawbacks; see [2] for a detailed discussion.

We have then a situation in which it is not easy to
establish the existence of a solution, and not easy to obtain
the solution once the existence has been established.

Consider, however, a discrete version of the foregoing
problem. Suppose that we wish to minimize

(2)  $J_N(u) = \sum_{k=0}^{N-1} g(u_k, v_k)\,\Delta, \quad u_0 = c,$

where $u_{k+1} = u_k + v_k \Delta$. Very mild conditions upon the function
g will enable us to assert the existence of an attained
minimum. Furthermore, if we allow u and v to assume only a
finite set of values, all we ask is that the function g(u,v)
be defined for the allowable values of u and v.
The recurrence relation

(3)  $f_N(c) = \min_v \left[ g(c, v)\,\Delta + f_{N-1}(c + v\Delta) \right]$

yields a constructive way of obtaining the desired minimum
value.
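The constructive character of this recurrence can be seen in a small Python sketch. It assumes, for illustration only, that u is restricted to the grid of multiples of a spacing h, that v takes finitely many values chosen so that c + vΔ stays on that grid, and that a process with no stages remaining has cost zero. The names `discrete_minimum`, `v_steps` and `max_index`, and the choice g(u,v) = u² + v², are hypothetical.

```python
from functools import lru_cache


def discrete_minimum(N, c_index, g, h, delta, v_steps, max_index):
    """Minimum of the N-stage discrete problem starting from u = c_index * h.

    Grid points are u = i * h with |i| <= max_index; a step j in v_steps
    corresponds to v = j * h / delta, so u + v * delta = (i + j) * h.
    """

    @lru_cache(maxsize=None)
    def f(n, i):
        if n == 0:
            return 0.0  # assumed: no cost when no stages remain
        best = float("inf")
        for j in v_steps:
            nxt = i + j
            if abs(nxt) > max_index:
                continue  # step would leave the allowed grid
            v = j * h / delta
            best = min(best, g(i * h, v) * delta + f(n - 1, nxt))
        return best

    return f(N, c_index)


# Example: g(u, v) = u^2 + v^2, starting from u = 1.0 on a grid of spacing 0.5.
two_stage_minimum = discrete_minimum(
    N=2, c_index=2, g=lambda u, v: u * u + v * v,
    h=0.5, delta=0.5, v_steps=(-1, 0, 1), max_index=4,
)
```

Working with integer grid indices rather than floating-point values of u keeps the memoization exact; nothing is asked of g beyond being defined at the allowable values.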

The question naturally arises as to the relation between
the discrete and continuous versions of this multistage
decision process. We suspect that as $\Delta \to 0$ the discrete
process will converge to the continuous process, if we impose
sufficient regularity conditions upon g(u,u').

This is indeed the case. It is quite easy to show that
the conditions usually imposed upon g(u,u') to guarantee the
existence of a solution are strong enough to yield the desired
limiting behavior. See the proof by Fleming in [ x].

The more interesting problem is to determine conditions
upon g(u,u') which will guarantee that the limit of the
discrete process exists as $\Delta \to 0$. We can then define a
continuous process, not directly by way of (1), but in this
fashion.

It turns out that this program can be carried out. In
[21] it was shown that using only the recurrence relation of
(3) and imposing upon g(u,u') conditions which are far
weaker than those required in the classical theory, the
existence of a limit for $f_N(c)$ as $\Delta \to 0$ can be established.
It follows that we have a concept of a continuous variational
process which generalizes that of the classical version.
Perhaps the most important aspect of this approach is
that it enables us to introduce the idea of a continuous
process in situations in which no classical theory exists. In
[3] we discussed this for stochastic variational processes.

It is clear that we can in a similar fashion build upon the
formulation of discrete adaptive processes we have given in
the preceding pages to formulate a theory of continuous
variational processes of adaptive type. Similarly, we can construct
a theory of continuous multistage games, of ordinary stochastic
or adaptive type. See [1] for the formulation of the discrete
multistage version.

The problems of convergence of the return function and of
the optimal policies are quite complex. They require a blend
of classical analysis and probability theory which has not
heretofore existed.

11.  Reduction  of  Dimensionality  and  Sufficient  Statistics 

The functional equations we derived to treat adaptive
control processes involve, in many cases, functions of
functions. Although these functions can be used to establish
the existence of optimal policies, they are not well suited to
analytic investigation nor to computational work.

In order to obtain analytic and numerical results, it is
essential that we reduce these functions of functions to
ordinary functions. In many cases, we can perform this
reduction by using the concept of sufficient statistics. This
idea enables us to reduce the information pattern from a set
of functions to an ordinary vector.
As an example of this, consider a process in which a
certain random variable r assumes only the two values 0
and 1 with unknown probabilities of respectively 1 - p
and p. After the process has continued for N stages, we
have acquired an information pattern [0,0,1,1,1,...,0,1]
consisting of the values assumed by r over the preceding N
trials.

In place of this set of values which increases in size as
the process continues, we can often use merely the number of
1's and the number of 0's which have been tabulated. In
these cases, the order of occurrence is of no importance.

In place of a function $f_N(c;S)$ where
S = [0,0,1,1,1,...,0,1], we will now have a function
$f_N(c,m,n)$, where m is the number of 1's and n is the
number of 0's; see Bellman [22], Bellman and Kalaba [4],
Freimer [23], [24], for applications of this idea. Clearly,
this technique can be used in many ways.
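The reduction from a full history to a pair of counts can be sketched in a few lines of Python. The helper names `compress` and `likelihood` are hypothetical; the point of the sketch is only that, for independent trials, the probability of an observed history depends on it through (m, n) alone.

```python
def compress(history):
    """Reduce a 0/1 history to (m, n): the number of 1's and of 0's."""
    m = sum(history)
    n = len(history) - m
    return m, n


def likelihood(history, p):
    """Probability of the exact history when each trial is 1 with probability p."""
    out = 1.0
    for r in history:
        out *= p if r == 1 else 1.0 - p
    return out


history_a = [0, 0, 1, 1, 1, 0, 1]
history_b = [1, 1, 1, 1, 0, 0, 0]  # same counts, different order
m, n = compress(history_a)
```

Two histories with the same counts give the same likelihood for every p, which is what allows a function of the whole history to be replaced by a function of two integers.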

One technique which has not been investigated as yet is
that of "asymptotic sufficient statistics." Perhaps the best
example of this is the central limit theorem. If the random
variables $x_i$ are drawn from an unknown distribution, and if
it is desired to determine the distribution of $\sum_{i=1}^{N} x_i$,
we know that for large N it is sufficient to tabulate merely
the two sums $\sum_{i=1}^{N} x_i$, $\sum_{i=1}^{N} x_i^2$.

If, as in many cases, we are interested only in steady-state
policies, which is to say asymptotic policies, results
of this type will enable us to reduce the dimension of the
problem greatly. Other techniques for reduction of dimensionality
will be found in Bellman [25], Beckwith [26], Bellman
and Kalaba [27], Bellman and Dreyfus [28].

12. Linear Equations and Quadratic Criteria - I

In  view  of  the  analytic  complexity  of  the  general  problem, 
and  with  application  of  the  method  of  successive  approximations 
in  mind,  it  is  worthwhile  to  consider  processes  governed  by 
linear  equations  and  quadratic  criteria. 

Let us consider first the scalar case. Write

(1)  $u_{n+1} = a u_n + v_n + r_n, \quad u_0 = c,$

and suppose that the $v_n$ are to be chosen so as to minimize
the expected value of the quadratic form

(2)  $J_N = \sum_{n=0}^{N} \left( u_n^2 + \lambda v_n^2 \right).$

Consider first the stochastic case where the $r_n$ are
independent random variables with known distributions, which
for simplicity of notation we shall take to be the same.

Writing

(3)  $f_N(c) = \min_{v} \operatorname{Exp}_{r} J_N,$

it is easy to see that

(4)  $f_0(c) = c^2,$

and

(5)  $f_N(c) = \min_{v_0} \left[ c^2 + \lambda v_0^2 + \operatorname{Exp}_{r_0} f_{N-1}(ac + v_0 + r_0) \right]$
     $\phantom{f_N(c)} = \min_{v_0} \left[ c^2 + \lambda v_0^2 + \int f_{N-1}(ac + v_0 + r)\, dG(r) \right].$


It is easy to show inductively that $f_N(c)$ is a
quadratic in c, i.e.,

(6)  $f_N(c) = u_N + v_N c + w_N c^2, \quad N = 0, 1, 2, \ldots,$

where $u_N$, $v_N$, and $w_N$ are independent of c. Using this
representation for the functions $f_N(c)$, we readily obtain
representations for $u_N$, $v_N$, and $w_N$ in terms of $u_{N-1}$, $v_{N-1}$,
and $w_{N-1}$; see Bellman [29], Kramer [30], Beckwith [26],
Freimer [23], Adorno [31].

These results can now be used for computational purposes
and to study the asymptotic behavior of return functions and
optimal policies as $N \to \infty$.
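The inductive step can be carried out numerically. The Python sketch below is one way to do so, worked out here rather than taken from the cited references: it assumes the dynamics u_{n+1} = a u_n + v_n + r_n, the criterion sum of (u_n² + λ v_n²), λ > 0, and f_0(c) = c², and it needs only the first two moments of r. Completing the square in ac + v_0 gives a minimizing decision that is linear in c, of the form v_0 = gain·c + beta. The function names and argument names are hypothetical.

```python
def quadratic_step(u, v, w, a, lam, mean_r, mean_r2):
    """One step of the coefficient recurrence for f(c) = u + v*c + w*c^2.

    mean_r and mean_r2 are E[r] and E[r^2]; assumes lam + w > 0.
    Returns the new (u, v, w) and the optimal decision v0 = gain*c + beta.
    """
    alpha = lam * a / (lam + w)                # optimal ac + v0 = alpha*c + beta
    beta = -(v + 2.0 * w * mean_r) / (2.0 * (lam + w))
    gain = alpha - a
    w_new = 1.0 + lam * gain ** 2 + w * alpha ** 2
    v_new = 2.0 * lam * gain * beta + v * alpha + 2.0 * w * alpha * (beta + mean_r)
    u_new = (lam * beta ** 2 + u + v * (beta + mean_r)
             + w * (beta ** 2 + 2.0 * beta * mean_r + mean_r2))
    return u_new, v_new, w_new, gain, beta


def quadratic_coefficients(N, a, lam, mean_r, mean_r2):
    """Coefficients (u_N, v_N, w_N), starting from f_0(c) = c^2."""
    u, v, w = 0.0, 0.0, 1.0
    for _ in range(N):
        u, v, w, _, _ = quadratic_step(u, v, w, a, lam, mean_r, mean_r2)
    return u, v, w


# Example: r is 1 with probability 0.3 and 0 otherwise, so E[r] = E[r^2] = 0.3.
u1, v1, w1 = quadratic_coefficients(N=1, a=0.9, lam=0.5, mean_r=0.3, mean_r2=0.3)
f1_at_2 = u1 + v1 * 2.0 + w1 * 4.0
```

Iterating `quadratic_step` for large N is a direct way to examine numerically how the coefficients and the linear decision rule behave as the number of stages grows.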


13. Linear Equations and Quadratic Criteria - II

Let us now consider an adaptive version. Suppose that
$\{r_n\}$ is a sequence of random variables with probability p
of assuming the value 1 and 1 - p of 0. It is clear that
we can use the idea of sufficient statistics. Let

(1)  $f_N(c, m, n) = \min \operatorname{Exp} J_N,$

where m 1-values and n 0-values have been observed for the
$r_n$ over the past m + n stages.
Let dG(p) be an initial a priori distribution for p
and suppose that it is agreed to use the following
transformations:


The result is that after m + n trials with m 1's and n
0's, we have as the new a priori distribution


We use as an estimate for p for the next stage the value

Hence, the functional equation for $f_N(c,m,n)$ is

(5)  $f_N(c, m, n) = \min_{v_0} \left[ c^2 + \lambda v_0^2 + p_{m,n} f_{N-1}(ac + v_0 + 1, m + 1, n) + (1 - p_{m,n}) f_{N-1}(ac + v_0, m, n + 1) \right].$

As above, we can use the structural relation of (12.6) to
simplify this relation.
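A small numerical sketch may help fix ideas about the adaptive functional equation. It is built on assumptions that are not taken from the text: the estimate of p after m ones and n zeros is taken to be the Bayes posterior mean under a uniform initial distribution, (m + 1)/(m + n + 2); the recurrence is taken to weight f_{N-1}(ac + v_0 + 1, m + 1, n) by that estimate and f_{N-1}(ac + v_0, m, n + 1) by its complement, with f_0 = c²; and the minimization over v_0 is replaced by a search over a coarse grid. All function and parameter names are hypothetical.

```python
def p_estimate(m, n):
    """Assumed estimate of p: posterior mean under a uniform prior."""
    return (m + 1) / (m + n + 2)


def adaptive_value(N, c, m, n, a=0.9, lam=0.5, v_grid=None):
    """Approximate f_N(c, m, n) by grid search over the decision v0."""
    if v_grid is None:
        v_grid = [k / 10 for k in range(-30, 31)]  # v0 in [-3, 3], step 0.1
    if N == 0:
        return c * c
    p = p_estimate(m, n)
    best = float("inf")
    for v0 in v_grid:
        nxt = a * c + v0
        value = (c * c + lam * v0 * v0
                 # r = 1 observed: state shifts by 1, count of ones grows
                 + p * adaptive_value(N - 1, nxt + 1, m + 1, n, a, lam, v_grid)
                 # r = 0 observed: count of zeros grows
                 + (1 - p) * adaptive_value(N - 1, nxt, m, n + 1, a, lam, v_grid))
        best = min(best, value)
    return best


one_stage = adaptive_value(1, 2.0, 0, 0)
```

The cost of this brute-force search grows geometrically with N, which is one practical reason for exploiting the quadratic structure instead of tabulating over a grid.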
14. Discussion

The problem of determining the asymptotic behavior of
$f_N(c,m,n)$ as $N \to \infty$ has many complex features. It is to
be expected that in one sense or another, $f_N(c,m,n) \sim f_N(c,p_0)$
as $N \to \infty$, where $p_0$ is the actual value of p. A particular
version of this problem has been attacked by Adorno [31].

15. Open Problems

In the foregoing sections we have indicated how a theory
of adaptive control processes can be constructed. Associated
with this approach, there are any number of analytic problems
which we have explicitly or tacitly raised. Some of these are

(1)  (a) Is the set of transformations in (13.2) the "best"
         way to modify a priori information?

     (b) Is the estimate of (13.4) the "best" estimate for p?

     (c) Asymptotically, does it make much difference what
         transformations we employ from stage to stage, and
         what a priori information we assume?

The analytic difficulties in this field are great, but
the conceptual difficulties are greater. It seems reasonable
to believe that there never will be definitive theories in
this area, nor is it clear that the word "optimal" has an
absolute meaning. We can summarize the situation simply by
saying that all of the philosophical paradoxes of statistics
and game theory are present, with their cousins and their
sisters and their aunts.